AITopics | scrna-seq dataset

Collaborating Authors

scrna-seq dataset

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

scE2TM improves single-cell embedding interpretability and reveals cellular perturbation signatures

Chen, Hegang, Lu, Yuyin, Zhao, Yifan, Dai, Zhiming, Wang, Fu Lee, Li, Qing, Rao, Yanghui, Li, Yue

arXiv.org Artificial IntelligenceDec-8-2025

Single-cell RNA sequencing technologies have revolutionized our understanding of cellular heterogeneity, yet computational methods often struggle to balance performance with biological interpretability. Embedded topic models have been widely used for interpretable single-cell embedding learning. However, these models suffer from the potential problem of interpretation collapse, where topics semantically collapse towards each other, resulting in redundant topics and incomplete capture of biological variation. Furthermore, the rise of single-cell foundation models creates opportunities to harness external biological knowledge for guiding model embeddings. Here, we present scE2TM, an external knowledge-guided embedded topic model that provides a high-quality cell embedding and interpretation for scRNA-seq analysis. Through embedding clustering regularization method, each topic is constrained to be the center of a separately aggregated gene cluster, enabling it to capture unique biological information. Across 20 scRNA-seq datasets, scE2TM achieves superior clustering performance compared with seven state-of-the-art methods. A comprehensive interpretability benchmark further shows that scE2TM-learned topics exhibit higher diversity and stronger consistency with underlying biological pathways. Modeling interferon-stimulated PBMCs, scE2TM simulates topic perturbations that drive control cells toward stimulated-like transcriptional states, faithfully mirroring experimental interferon responses. In melanoma, scE2TM identifies malignant-specific topics and extrapolates them to unseen patient data, revealing gene programs associated with patient survival.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2507.08355

Country:

Asia > China (0.28)
North America > Canada (0.28)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.47)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Add feedback

scAGC: Learning Adaptive Cell Graphs with Contrastive Guidance for Single-Cell Clustering

Li, Huifa, Fu, Jie, Zhuang, Xinlin, Yang, Haolin, Ling, Xinpeng, Cheng, Tong, xue, Haochen, Razzak, Imran, Chen, Zhili

arXiv.org Artificial IntelligenceAug-14-2025

Accurate cell type annotation is a crucial step in analyzing single-cell RNA sequencing (scRNA-seq) data, which provides valuable insights into cellular heterogeneity. However, due to the high dimensionality and prevalence of zero elements in scRNA-seq data, traditional clustering methods face significant statistical and computational challenges. While some advanced methods use graph neural networks to model cell-cell relationships, they often depend on static graph structures that are sensitive to noise and fail to capture the long-tailed distribution inherent in single-cell populations.To address these limitations, we propose scAGC, a single-cell clustering method that learns adaptive cell graphs with contrastive guidance. Our approach optimizes feature representations and cell graphs simultaneously in an end-to-end manner. Specifically, we introduce a topology-adaptive graph autoencoder that leverages a differentiable Gumbel-Softmax sampling strategy to dynamically refine the graph structure during training. This adaptive mechanism mitigates the problem of a long-tailed degree distribution by promoting a more balanced neighborhood structure. To model the discrete, over-dispersed, and zero-inflated nature of scRNA-seq data, we integrate a Zero-Inflated Negative Binomial (ZINB) loss for robust feature reconstruction. Furthermore, a contrastive learning objective is incorporated to regularize the graph learning process and prevent abrupt changes in the graph topology, ensuring stability and enhancing convergence. Comprehensive experiments on 9 real scRNA-seq datasets demonstrate that scAGC consistently outperforms other state-of-the-art methods, yielding the best NMI and ARI scores on 9 and 7 datasets, respectively.Our code is available at Anonymous Github.

artificial intelligence, graph, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2508.0918

Country:

North America > United States (0.46)
Asia > Middle East > UAE (0.28)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

GRIT: Graph-Regularized Logit Refinement for Zero-shot Cell Type Annotation

Hu, Tianxiang, Zhou, Chenyi, Liu, Jiaxiang, Wang, Jiongxin, Chen, Ruizhe, Xia, Haoxiang, Wang, Gaoang, Wu, Jian, Liu, Zuozhu

arXiv.org Artificial IntelligenceAug-8-2025

Cell type annotation is a fundamental step in the analysis of single-cell RNA sequencing (scRNA-seq) data. In practice, human experts often rely on the structure revealed by principal component analysis (PCA) followed by $k$-nearest neighbor ($k$-NN) graph construction to guide annotation. While effective, this process is labor-intensive and does not scale to large datasets. Recent advances in CLIP-style models offer a promising path toward automating cell type annotation. By aligning scRNA-seq profiles with natural language descriptions, models like LangCell enable zero-shot annotation. While LangCell demonstrates decent zero-shot performance, its predictions remain suboptimal, particularly in achieving consistent accuracy across all cell types. In this paper, we propose to refine the zero-shot logits produced by LangCell through a graph-regularized optimization framework. By enforcing local consistency over the task-specific PCA-based k-NN graph, our method combines the scalability of the pre-trained models with the structural robustness relied upon in expert annotation. We evaluate our approach on 14 annotated human scRNA-seq datasets from 4 distinct studies, spanning 11 organs and over 200,000 single cells. Our method consistently improves zero-shot annotation accuracy, achieving accuracy gains of up to 10%. Further analysis showcase the mechanism by which GRIT effectively propagates correct signals through the graph, pulling back mislabeled cells toward more accurate predictions. The method is training-free, model-agnostic, and serves as a simple yet effective plug-in for enhancing automated cell type annotation in practice.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2508.04747

Country: Europe (0.28)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

scDD: Latent Codes Based scRNA-seq Dataset Distillation with Foundation Model Knowledge

Yu, Zhen, Han, Jianan, Liu, Yang, Chen, Qingchao

arXiv.org Artificial IntelligenceMar-6-2025

Single-cell RNA sequencing (scRNA-seq) technology has profiled hundreds of millions of human cells across organs, diseases, development and perturbations to date. However, the high-dimensional sparsity, batch effect noise, category imbalance, and ever-increasing data scale of the original sequencing data pose significant challenges for multi-center knowledge transfer, data fusion, and cross-validation between scRNA-seq datasets. To address these barriers, (1) we first propose a latent codes-based scRNA-seq dataset distillation framework named scDD, which transfers and distills foundation model knowledge and original dataset information into a compact latent space and generates synthetic scRNA-seq dataset by a generator to replace the original dataset. Then, (2) we propose a single-step conditional diffusion generator named SCDG, which perform single-step gradient back-propagation to help scDD optimize distillation quality and avoid gradient decay caused by multi-step back-propagation. Meanwhile, SCDG ensures the scRNA-seq data characteristics and inter-class discriminability of the synthetic dataset through flexible conditional control and generation quality assurance. Finally, we propose a comprehensive benchmark to evaluate the performance of scRNA-seq dataset distillation in different data analysis tasks. It is validated that our proposed method can achieve 7.61% absolute and 15.70% relative improvement over previous state-of-the-art methods on average task.

dataset, scrna-seq dataset, synthetic dataset, (15 more...)

arXiv.org Artificial Intelligence

2503.04357

Country:

Asia > China > Beijing > Beijing (0.04)
Europe > Portugal > Braga > Braga (0.04)
Africa > Togo (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (0.68)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.68)
(2 more...)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

DP-DCAN: Differentially Private Deep Contrastive Autoencoder Network for Single-cell Clustering

Li, Huifa, Fu, Jie, Chen, Zhili, Yang, Xiaomin, Liu, Haitao, Ling, Xinpeng

arXiv.org Artificial IntelligenceNov-6-2023

Single-cell RNA sequencing (scRNA-seq) is important to transcriptomic analysis of gene expression. Recently, deep learning has facilitated the analysis of high-dimensional single-cell data. Unfortunately, deep learning models may leak sensitive information about users. As a result, Differential Privacy (DP) is increasingly used to protect privacy. However, existing DP methods usually perturb whole neural networks to achieve differential privacy, and hence result in great performance overheads. To address this challenge, in this paper, we take advantage of the uniqueness of the autoencoder that it outputs only the dimension-reduced vector in the middle of the network, and design a Differentially Private Deep Contrastive Autoencoder Network (DP-DCAN) by partial network perturbation for single-cell clustering. Since only partial network is added with noise, the performance improvement is obvious and twofold: one part of network is trained with less noise due to a bigger privacy budget, and the other part is trained without any noise. Experimental results of six datasets have verified that DP-DCAN is superior to the traditional DP scheme with whole network perturbation. Moreover, DP-DCAN demonstrates strong robustness to adversarial attacks. The code is available at https://github.com/LFD-byte/DP-DCAN.

dataset, differential privacy, dp-dcan, (11 more...)

arXiv.org Artificial Intelligence

2311.0341

Country:

Asia > China > Shanghai > Shanghai (0.04)
North America > United States > New York > New York County > New York City (0.04)
Asia > Japan > Kyūshū & Okinawa > Kyūshū > Fukuoka Prefecture > Fukuoka (0.04)

Genre: Research Report (0.82)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)

Add feedback

Dimension Reduction for Data with Heterogeneous Missingness

Ling, Yurong, Liu, Zijing, Xue, Jing-Hao

arXiv.org Machine LearningSep-27-2021

Dimension reduction plays a pivotal role in analysing high-dimensional data. However, observations with missing values present serious difficulties in directly applying standard dimension reduction techniques. As a large number of dimension reduction approaches are based on the Gram matrix, we first investigate the effects of missingness on dimension reduction by studying the statistical properties of the Gram matrix with or without missingness, and then we present a bias-corrected Gram matrix with nice statistical properties under heterogeneous missingness. Extensive empirical results, on both simulated and publicly available real datasets, show that the proposed unbiased Gram matrix can significantly improve a broad spectrum of representative dimension reduction approaches.

dataset, matrix, visualisation, (16 more...)

arXiv.org Machine Learning

2109.11765

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
North America > United States > Virginia > Arlington County > Arlington (0.04)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback